Statistically Significant Pattern Mining With Ordinal Utility
Abstract
Statistically significant pattern mining (SSPM), which evaluates each pattern via a hypothesis test, is an essential and challenging data mining task for knowledge discovery. We introduce a preference relation between patterns and aim to discover the most preferred patterns under the constraint of statistical significance, which has never been considered in existing SSPM problems. We propose an iterative multiple testing procedure that can alternately reject a hypothesis and safely ignore the hypotheses that are less useful than the rejected one. By filtering out patterns with low utility, we avoid consuming the significance budget on rejecting useless (uninteresting) hypotheses and can focus on more useful patterns, leading to more useful discoveries. We show that the proposed method can control the familywise error rate (FWER) under certain assumptions, which can be satisfied by a realistic problem class in SSPM. We also show that the proposed method always discovers equally or more useful patterns than Tarone-Bonferroni and Subfamily-wise Multiple Testing (SMT). Finally, we conducted several experiments with both synthetic and real-world data to evaluate the performance of our method. On the real-world datasets, the proposed method discovered many more useful patterns in all five tasks.
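As a point of reference for the FWER control mentioned in the abstract, the following is a minimal sketch of the plain Bonferroni correction. It is not the procedure proposed in the paper, nor the Tarone-Bonferroni or SMT variants; the function name `bonferroni_reject` and the p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return indices of hypotheses rejected under plain Bonferroni correction.

    Each of the m hypotheses is tested at level alpha / m, which keeps the
    familywise error rate (FWER) at or below alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # per-hypothesis share of the significance budget
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical p-values, one per candidate pattern (illustration only).
example_p_values = [0.001, 0.02, 0.04, 0.3]
rejected = bonferroni_reject(example_p_values, alpha=0.05)
print(rejected)
```

In this baseline, every tested hypothesis shrinks the per-hypothesis threshold, which is one way to read the "significance budget" that the abstract says should not be spent on uninteresting patterns.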
Similar resources
On Mining Statistically Significant Attribute Association Information
Knowledge of the association information between the attributes in a data set provides insight into the underlying structure of the data and explains the relationships (independence, synergy, redundancy) between the attributes. Complex models learnt computationally from the data are more interpretable to a human analyst when such interdependencies are known. In this paper, we focus on mining tw...
High-Utility Sequential Pattern Mining with Multiple Minimum Utility Thresholds
High-utility sequential pattern mining has been an emerging topic in recent decades, and most algorithms were designed to identify the complete set of high-utility sequential patterns under a single minimum utility threshold. In this paper, we first propose a novel framework called high-utility sequential pattern mining with multiple minimum utility thresholds to mine high utility sequential pattern...
Algorithms for Efficient Mining of Statistically Significant Attribute Association Information
Knowledge of the association information between the attributes in a data set provides insight into the underlying structure of the data and explains the relationships (independence, synergy, redundancy) between the attributes and class (if present). Complex models learnt computationally from the data are more interpretable to a human analyst when such interdependencies are known. In this paper...
Mining Statistically Significant Substrings using the Chi-Square Statistic
The problem of identification of statistically significant patterns in a sequence of data has been applied to many domains such as intrusion detection systems, financial models, web-click records, automated monitoring systems, computational biology, cryptology, and text analysis. An observed pattern of events is deemed to be statistically significant if it is unlikely to have occurred due to ra...
Mining Statistically Significant Patterns using the Chi-Square Statistic
Statistical significance is used to ascertain whether the outcome of a given experiment can be ascribed to some extraneous factors or is solely due to chance. An observed pattern of events is deemed to be statistically significant if it is unlikely to have occurred due to randomness or chance alone. In the thesis, we study the problem of identifying the statistically relevant patterns in string...
Journal
Journal title: IEEE Transactions on Knowledge and Data Engineering
Year: 2022
ISSN: 1558-2191, 1041-4347, 2326-3865
DOI: https://doi.org/10.1109/tkde.2022.3208626